[Misc] Fix "Current vLLM config is not set." warnings, assert to avoid issues in the future (#31747)
Conversation
Code Review
This pull request improves the robustness of the vLLM configuration handling by replacing a warning with an assertion when the configuration is not set. This helps to catch potential bugs early. An environment variable VLLM_ALLOW_DEFAULT_CONFIG is introduced to maintain the old behavior for testing purposes, which is a thoughtful addition.
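The warning-to-assertion change with an environment-variable escape hatch can be sketched in plain Python. This is a simplified illustration of the pattern, not vLLM's actual implementation; the names `VLLM_ALLOW_DEFAULT_CONFIG` and `get_current_vllm_config` come from the PR, while the config dict stand-in is hypothetical:

```python
import os

# In real code this is set by a set_current_vllm_config() context manager.
_current_config = None


def get_current_vllm_config():
    """Return the active config; fail loudly if none was set.

    Sketch of the behavior change: instead of logging a warning and
    silently building a default config, raise an AssertionError unless
    the VLLM_ALLOW_DEFAULT_CONFIG escape hatch is enabled (e.g. for tests).
    """
    if _current_config is not None:
        return _current_config
    if os.environ.get("VLLM_ALLOW_DEFAULT_CONFIG", "0") == "1":
        # Stand-in for constructing a default VllmConfig.
        return {"default": True}
    raise AssertionError(
        "Current vLLM config is not set; wrap this call in "
        "set_current_vllm_config(...) or set VLLM_ALLOW_DEFAULT_CONFIG=1."
    )
```

The escape hatch keeps old test setups working while making silent config fallbacks a hard error everywhere else.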
The other changes in the pull request are logical consequences of this stricter configuration check. The use of a lazy dictionary in vllm/model_executor/layers/fused_moe/cpu_fused_moe.py correctly defers the instantiation of CustomOp subclasses until they are needed, avoiding errors during module import. Similarly, caching the GroupedTopk instance in vllm/model_executor/layers/fused_moe/layer.py is a good optimization that also resolves the configuration access issue.
Overall, the changes are well-implemented, improve code quality, and I have no concerns.
|
Hi @LucasWilkinson, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
|
ProExpertProg
left a comment
Thanks for addressing this!
|
|
ProExpertProg
left a comment
Just a nit for the error message, thanks for this cleanup!
|
Force-pushed from ff497f1 to 54b1339
[Misc] Fix "Current vLLM config is not set." warnings, assert to avoid issues in the future (vllm-project#31747)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
Thanks for fixing this! Really appreciate it |
…0+) (#688)

* fix: vLLM >= 0.14.0 collector compatibility for set_current_vllm_config

vLLM v0.14.0 (vllm-project/vllm#31747) changed get_current_vllm_config() from a warning to a hard AssertionError when called outside a set_current_vllm_config() context. This broke all vLLM collectors on v0.17.0 (1793 errors in pipeline 47590887). Three fixes:

- utils.py: wrap ensure_model_parallel_initialized() in a config context inside setup_distributed() (fixes 1559 errors across gemm, mla, dsa)
- collect_moe.py: pass dp_size=1 to the FusedMoE() constructor (fixes 32 mxfp4 errors where vLLM tried to query the uninitialized DP group)
- collect_moe.py: add is_gated_activation=True to prepare_static_weights_for_trtllm_fp4_moe() (fixes 202 nvfp4 errors from a new required argument in vLLM v0.17.0)

Also wraps all MoE benchmark runs in a set_current_vllm_config() context, adds an _nvfp4_available gate, and bumps __compat__ to vllm>=0.14.0.

* feat: add nvidia/MiniMax-M2.5-NVFP4 to collector MoE test cases

* fix: wrap vLLM all_reduce collector initialize_model_parallel in config context

collect_all_reduce.py has its own setup_vllm_distributed() that calls initialize_model_parallel() without a set_current_vllm_config() context, causing an AssertionError on all ranks with vLLM >= 0.14.0.

* fix: resolve MLA module collector model paths locally to avoid HF downloads

collect_mla_module.py passes model names (e.g. "deepseek-ai/DeepSeek-V3") to vLLM's ModelConfig, which calls AutoConfig.from_pretrained() and requires network access to HuggingFace Hub. In CI containers without internet this causes an OSError for all test cases (4210 errors in pipeline 292175043). The HF configs already exist in src/aiconfigurator/model_configs/. Add _resolve_model_path(), which creates a temp directory with a symlink to the local config.json, so vLLM loads from disk instead of downloading.

Signed-off-by: Simone Chen <simonec@nvidia.com>
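The downstream fixes above all amount to wrapping initialization calls in a set_current_vllm_config() context. The underlying context-manager pattern can be sketched in plain Python (hypothetical `set_current_config`/`get_current_config` names, no vLLM dependency):

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the active config for the current execution context.
_config_var: ContextVar = ContextVar("current_config", default=None)


@contextmanager
def set_current_config(cfg):
    """Install cfg as the current config for the dynamic extent of the block."""
    token = _config_var.set(cfg)
    try:
        yield cfg
    finally:
        # Restore whatever was active before, even if the block raised.
        _config_var.reset(token)


def get_current_config():
    """Return the active config, asserting that one was set."""
    cfg = _config_var.get()
    assert cfg is not None, "current config is not set"
    return cfg
```

Callers such as the collectors' setup_distributed() would run their initialization inside `with set_current_config(cfg): ...`, so any nested code reading the current config sees the real one instead of tripping the assert.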
#30531 and #29575 introduced accesses to get_current_vllm_config on boot that are outside of set_current_vllm_config contexts, leading to repeated logs when get_current_vllm_config falls back to creating a default config. This also led to subtle config propagation bugs, as some custom ops would see the default config instead of the real one. This PR fixes those accesses and turns the warning into an assert to try to avoid this in the future.
Longer term we should consider something like: #30859
TODO: get CI green with targeted additions of VLLM_ALLOW_DEFAULT_CONFIG